Export bagit #27
c20a4c1 to 34dd3ee
invenio_sipstore/models.py
Outdated
@@ -71,24 +65,35 @@ class SIP(db.Model, Timestamp):
    agent = db.Column(JSONType, default=lambda: dict(), nullable=False)
    """Agent information regarding given SIP."""

    archivable = db.Column(
        db.Boolean(name='ck_sipstore_archivable'),
Should we follow the simple naming used in invenio-files-rest? See https://github.com/inveniosoftware/invenio-files-rest/blob/master/invenio_files_rest/models.py#L234
Constraint name in sqlite3:
CONSTRAINT ck_sipstore_sip_ck_sipstore_archivable CHECK (archivable IN (0, 1)),
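The doubled `ck_` prefix in the generated name is what one would expect if a naming convention wraps the explicit constraint name. A minimal sketch of that expansion, assuming a template of the form `ck_%(table_name)s_%(constraint_name)s` and a table named `sipstore_sip` (both inferred from the constraint name above, not confirmed in this PR); `expand_check_name` is a hypothetical helper:

```python
# Assumed naming-convention template for CHECK constraints. This is an
# assumption inferred from the generated constraint name quoted above.
CK_TEMPLATE = "ck_%(table_name)s_%(constraint_name)s"


def expand_check_name(constraint_name, table_name="sipstore_sip"):
    """Return the final constraint name after the template is applied."""
    return CK_TEMPLATE % {
        "table_name": table_name,
        "constraint_name": constraint_name,
    }


# The name used in the diff already carries a prefix, so it gets doubled.
print(expand_check_name("ck_sipstore_archivable"))
# A bare name keeps the generated name short.
print(expand_check_name("archivable"))
```

Under these assumptions, passing a bare name such as `archivable` would avoid the repeated prefix.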
invenio_sipstore/models.py
Outdated
        db.ForeignKey(SIP.id, name='fk_sipmetadata_sip_id'))
    """Id of SIP."""

    format = db.Column(db.String(7), nullable=False)
Should this be a longer string, given that a 7-character string wouldn't save us anything w.r.t. the implementation in the DB?
I took it from here: https://github.com/inveniosoftware/invenio-sipstore/pull/27/files/34dd3ee95befda08384a34bc26818fb05fa66937#diff-a6c01358c922643bed721b952a78b8c6L59
You should just store 'json' or 'marcxml'...
invenio_sipstore/models.py
Outdated
@@ -155,6 +175,31 @@ def validate_key(self, filepath, filepath_):
    """Relation to the SIP along which given file was submitted."""


class SIPMetadata(db.Model, Timestamp):
Is this object meant to store different metadata formats per SIP, or should we also use it to store metadata edits on the record? If so, how do we keep track of versions?
I didn't change the content here; I just added a way to store multiple metadata entries. @lnielsen may have an answer?
        :return: a dict with final relative path as keys and content as value.
        :rtype: dict
        """
        def get_extention(format):
get_extention -> get_extension
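For illustration, a minimal sketch of what such a helper could look like once renamed, together with the "relative path to content" dict described in the docstring. The extension mapping, the `txt` fallback, and the `metadata/` prefix are assumptions for this example (only the 'json' and 'marcxml' format names appear in this review), and `build_metadata_files` is a hypothetical name:

```python
# Hypothetical mapping from metadata format name to file extension.
EXTENSIONS = {"json": "json", "marcxml": "xml"}


def get_extension(format_name):
    """Return the file extension for a metadata format (fallback: 'txt')."""
    return EXTENSIONS.get(format_name, "txt")


def build_metadata_files(metadata, prefix="metadata"):
    """Map each final relative path to its content, one file per format."""
    return {
        "{0}/{1}.{2}".format(prefix, name, get_extension(name)): content
        for name, content in metadata.items()
    }


files = build_metadata_files(
    {"json": '{"title": "demo"}', "marcxml": "<record/>"}
)
print(sorted(files))
```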
@@ -71,24 +65,35 @@ class SIP(db.Model, Timestamp):
    agent = db.Column(JSONType, default=lambda: dict(), nullable=False)
    """Agent information regarding given SIP."""

    archivable = db.Column(
Is the `archivable` flag needed? Should this be a state = {new, non_archivable, archived}?
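A minimal sketch of the state-based alternative raised in this comment, using the three values proposed there. The class name `SIPState` and the helper `is_pending` are hypothetical, and how such a state would be mapped to a database column is left open:

```python
from enum import Enum


class SIPState(Enum):
    """Hypothetical lifecycle states replacing the boolean flag."""

    NEW = "new"
    NON_ARCHIVABLE = "non_archivable"
    ARCHIVED = "archived"


def is_pending(state):
    """True when the SIP may still be archived but has not been yet."""
    return state is SIPState.NEW


print([state.value for state in SIPState])
```

A single state field can express "already archived", which a boolean `archivable` flag cannot.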
invenio_sipstore/api.py
Outdated
        :param bool create_sip_files: If True the SIPFiles will be created.
        :returns: RecordSIP object.
        :rtype: :py:class:`invenio_sipstore.api.RecordSIP`
        """
        files = record.files if create_sip_files else None
        metadata = {'json': json.dumps(record.dumps())}
        mtype = SIPMetadataType.get_from_schema(record['$schema'])
        metadata = {mtype.name: json.dumps(record.dumps())}
TODO: use the future "filename" field from the metadatatype table instead of mtype.name
Maybe some other combination of fields would be more intuitive (technically `filename` is used as a slug/tag). Proposals, for the example `Invenio JSON Record Metadata v1.0.0` - `invenio-record-json-v1.0.0`:
- `name` - `tag`/`slug`/`code`
- `title`/`description` - `name`
invenio_sipstore/models.py
Outdated
    """ID of the SIPMetadataType object."""

    name = db.Column(db.String(255), nullable=False, unique=True)
    """The name of type of metadata (i.e. 'zenodo-json-1.0.0')."""
We should probably make it Zenodo-agnostic (e.g. `invenio-record-v1.0.0`).
invenio_sipstore/models.py
Outdated
    """

    schema = db.Column(db.String(1024), nullable=True, unique=True)
    """Path to a schema that describes the metadata (json or xml schema)."""
We can probably say "URI to a schema..."
* Adds a celery task that generates a BagIt file with the SIP content. Signed-off-by: Javier Delgado <javier.delgado.fernandez@cern.ch>
- added class `SIPMetadata` to be able to have multiple metadata
- added attribute `archive` (boolean) to say if the content should be archived or not

- updated the model views to integrate the new fields
- added the model view for SIPMetadata with links to SIP

- add a signal when a SIP is created from the API
- 2 API classes: SIP and RecordSIP to manage the models based on Zenodo: https://github.com/zenodo/zenodo/blob/master/zenodo/modules/sipstore/api.py
- add a function to automatically find the current storage location of a SIPFile
- add a config variable to generate agent of the SIPs
- updated tests

- base class for archivers
- refactor of BagItArchiver

- Added SIPMetadataType for the type of metadata. This class describes the format of metadata, with an eventual schema to validate it if it exists.
- refactor the code to integrate this change
* Removes unique constraint from SIPMetadataType.title and sets it on SIPMetadataType.name. Signed-off-by: Alexander Ioannidis <a.ioannidis@cern.ch>
@krzysztof Can you take a last look and merge?
Integrates the work from @JavierDelgadoFernandez in #10 into the sipstore (closes #10).
Refactors the code to fit the new philosophy: the sipstore does nothing on its own; it is managed by invenio-archivematica, which creates the exports of the SIPs. Thus there is no need for a task.
Also made the archiver more generic: there is a base class so we can create whatever export we want.
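The base-class idea can be sketched as follows. This is an illustration only, not the code in this PR: `BaseArchiver`, `BagItLikeArchiver`, and `get_all_files` are hypothetical names, files are modelled as an in-memory dict, and the BagIt-style layout (payload under `data/`, a `manifest-sha256.txt`, and a `bagit.txt` declaration) is reduced to the bare minimum.

```python
import hashlib
from abc import ABC, abstractmethod


class BaseArchiver(ABC):
    """Hypothetical base class: subclasses decide the export layout."""

    def __init__(self, files):
        # files: dict mapping relative path -> bytes content
        self.files = files

    @abstractmethod
    def get_all_files(self):
        """Return a dict of relative path -> bytes for the export."""


class BagItLikeArchiver(BaseArchiver):
    """Places the payload under data/ and adds a SHA-256 manifest."""

    def get_all_files(self):
        export = {
            "data/" + path: content for path, content in self.files.items()
        }
        # One manifest line per payload file: "<checksum> <path>".
        manifest = "".join(
            "{0} {1}\n".format(hashlib.sha256(content).hexdigest(), path)
            for path, content in sorted(export.items())
        )
        export["manifest-sha256.txt"] = manifest.encode("utf-8")
        export["bagit.txt"] = (
            b"BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n"
        )
        return export


archive = BagItLikeArchiver({"record.json": b"{}"}).get_all_files()
print(sorted(archive))
```

Another export format would only need a new subclass that overrides `get_all_files`.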
IMPORTANT NOTE: This PR should be merged AFTER #26, as it depends on it.
If you look at the changelog, you'll see the diff of both PRs. To see just the changes on top of #26, see https://github.com/remileduc/invenio-sipstore/pull/1/files